{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "tags": [
     "remove-cell"
    ]
   },
   "outputs": [],
   "source": [
    "import sys\n",
    "import os\n",
    "if not any(path.endswith('textbook') for path in sys.path):\n",
    "    sys.path.append(os.path.abspath('../../..'))\n",
    "from textbook_utils import *"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "tags": [
     "remove-cell"
    ]
   },
   "outputs": [],
   "source": [
    "dogs = pd.read_csv('dogs.csv')"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "(ch:sql)=\n",
    "# Working with Relations Using SQL"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In {ref}`ch:pandas`, we used dataframes to represent tables of data. This chapter introduces\n",
    "*relations*, another widely used way to represent data tables. We\n",
    "also introduce SQL, the standard programming language for working with\n",
    "relations. Here's an example of a relation that holds information about\n",
    "popular dog breeds:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "tags": [
     "remove-cell"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>breed</th>\n",
       "      <th>grooming</th>\n",
       "      <th>food_cost</th>\n",
       "      <th>kids</th>\n",
       "      <th>size</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>Labrador Retriever</td>\n",
       "      <td>weekly</td>\n",
       "      <td>466.0</td>\n",
       "      <td>high</td>\n",
       "      <td>medium</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>German Shepherd</td>\n",
       "      <td>weekly</td>\n",
       "      <td>466.0</td>\n",
       "      <td>medium</td>\n",
       "      <td>large</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Beagle</td>\n",
       "      <td>daily</td>\n",
       "      <td>324.0</td>\n",
       "      <td>high</td>\n",
       "      <td>small</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Golden Retriever</td>\n",
       "      <td>weekly</td>\n",
       "      <td>466.0</td>\n",
       "      <td>high</td>\n",
       "      <td>medium</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Yorkshire Terrier</td>\n",
       "      <td>daily</td>\n",
       "      <td>324.0</td>\n",
       "      <td>low</td>\n",
       "      <td>small</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Bulldog</td>\n",
       "      <td>weekly</td>\n",
       "      <td>466.0</td>\n",
       "      <td>medium</td>\n",
       "      <td>medium</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Boxer</td>\n",
       "      <td>weekly</td>\n",
       "      <td>466.0</td>\n",
       "      <td>high</td>\n",
       "      <td>medium</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Jupyter doesn't have a built-in way to display relations, so we jiggle the\n",
    "# dataframe output a bit to make it look like a relation\n",
    "from IPython.display import display, HTML\n",
    "display(HTML(dogs.to_html(index=False)))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Like dataframes, each row in a relation represents a single record---in this case, a single\n",
    "dog breed. Each column represents a feature about the record---for example, the\n",
    "`grooming` column represents how often each dog breed needs to be groomed.\n",
    "\n",
    "Both relations and dataframes have labels for each column in the table. However, one key difference is that the rows in a relation don't have labels, while rows in a dataframe do."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this chapter, we demonstrate common relation operations using SQL.\n",
    "We start by explaining the structure of SQL queries. Then we show how to use\n",
    "SQL to perform common data manipulation tasks, like slicing, filtering,\n",
    "sorting, grouping, and joining.\n",
    "\n",
    ":::{note}\n",
    "\n",
    "This chapter replicates the data analyses in {ref}`ch:pandas` using\n",
    "relations and SQL instead of dataframes and Python. The datasets, data\n",
    "manipulations, and conclusions are nearly identical across the two chapters for ease of comparison between performing data manipulations in `pandas` and SQL.\n",
    "\n",
    ":::"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
